21 Scientific misconduct and the reproducibility crisis
Learning Objectives
After completing this tutorial you should be able to
- Define three consecutive steps in data analysis/methods as:
  - Acquiring data
  - Processing data
  - Analyzing data
- Compare and contrast the effects on reproducibility of using point & click programs such as Excel versus scripting languages for data analysis
- Recognize that analysis tools differ in how reliable, accessible, and verifiable they are, and that this limits how conducive they are to reproducible research.
Download the directory for this project here, make sure it is unzipped, and move it to your bi328 directory. You can open the Rproj for this module either by double-clicking on it, which will launch RStudio, or by opening RStudio and then using File > Open Project or clicking on the Rproject icon in the top right of your program window and selecting Open Project.
There should be a file named 21_reproducibility-research.qmd in that project directory. Use that file to work through this tutorial; you will hand in your rendered (“knitted”) Quarto file as your homework assignment. So, first thing in the YAML header, change the author to your name. You will use this Quarto document to record your answers. Remember to use comments to annotate your code; at minimum you should have one comment per code set.1 You may of course add as many comments as you need to be able to recall what you did. Similarly, take notes in the document as we work through the discussion/reflection questions, but make sure that you go back and clean them up for “public consumption”.
1 You should do this whether you are adding code yourself or using code from our manual, even if it isn’t commented in the manual. Especially when the code is already included for you, add comments to describe how the function works/what it does as we introduce it during the participatory coding session so you can refer back to it.
21.1 What even are these ‘Methods’ you speak of?
One framework for categorizing the components of a study’s “methods” is to place them into three steps that build on each other.
- Acquire data
- Process data
- Analyze data
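In a scripting language, each of these three steps can be written down explicitly, so that anyone can rerun the whole chain from raw data to result. The following is a minimal sketch in base R, not a prescribed workflow: the data frame `raw`, its columns `site` and `length_mm`, and the file name mentioned in the comment are all made up for illustration.

```r
# Step 1: Acquire data.
# In a real project this might be read.csv("data/measurements.csv")
# (hypothetical file name); a small made-up data frame stands in here
# so the example runs on its own.
raw <- data.frame(
  site      = c("A", "A", "B", "B", "B"),
  length_mm = c(102, NA, 98, 110, 95)
)

# Step 2: Process data.
# Drop rows with missing measurements and convert millimeters to centimeters.
clean <- raw[!is.na(raw$length_mm), ]
clean$length_cm <- clean$length_mm / 10

# Step 3: Analyze data.
# Calculate the mean length per site.
result <- aggregate(length_cm ~ site, data = clean, FUN = mean)
print(result)
```

Because every decision (which rows were dropped, how units were converted, which summary was calculated) is recorded in the script, a reader can check each step rather than having to trust the final numbers.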
21.2 Tools of the trade
Tiny history lesson: Lotus123 is what launched IBM Personal Computers into offices around the world …
My very first PC adventures included running Lotus123 off of big floppy disks. Microsoft developed Excel and the Office package, and quickly PCs were not only in offices but also in homes around the world. Spreadsheet applications were initially focused mainly on managing and organizing data (think HR department and payroll), but increasingly complicated calculations became possible and soon Excel sneakily made its way into scientific research.
Today, the tools scientists use to analyze their data range from highly specialized tools for very specific tasks (each with its own required data format) to large software packages like SAS and STATA. Many of these tools have GUIs (graphical user interfaces) and are what is frequently referred to as “point & click” or “WYSIWYG”2.
2 WYSIWYG = what you see is what you get. Compare this to WYWIWYG = what you want is what you get.
21.3 Impacts of Flawed Data Analysis
21.3.1 Case Study 1
21.3.2 Case Study 2
Before today’s class you should have completed these readings:
- Shariff et al. 2016 “What is the association between religious affiliation and children’s altruism?”
- Retraction notices of Decety et al. 2015 “RETRACTED: The Negative Association between Religiousness and Children’s Altruism across the World”
- Dig a little deeper: How a study based on a typo made news everywhere - and the retraction didn’t.
21.3.3 Case Study 3
Before today’s class you should have read key sections from Herndon et al. 2014 “Does high public debt consistently stifle economic growth? A critique of Reinhart & Rogoff”:
- Section 1 (Introduction)
- Section 2 (Public impact and policy relevance)
- Introduction to Section 3 (Replication) + section headers
- Section 4 (Conclusion)